15 research outputs found

    Explotando jerarquías de memoria distribuida/compartida con Hitmap

    Get PDF
    Clusters currently used for high-performance computing are built by interconnecting shared-memory machines. A common programming model for this kind of cluster is the message-passing paradigm, launching as many processes as there are cores available across all the machines of the cluster. However, this way of programming is not efficient. To exploit these hierarchical systems efficiently, a combination of different programming models and tools is required, each one suited to a different level of the execution platform. This work presents a method that eases programming for environments that combine distributed and shared memory. Coordination at the distributed-memory level is handled with the Hitmap library. We show how to integrate Hitmap with shared-memory programming models and with automatic tools that parallelize and optimize sequential code. This combination makes it possible to exploit the most appropriate techniques at each level of the system, and eases the generation of multilevel parallel programs that automatically adapt their communication and synchronization structures to the machine on which they run. Experimental results show that the proposal improves on the best results obtained with manually optimized reference programs using MPI or OpenMP.
    Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Máster en Investigación en Tecnologías de la Información y las Comunicaciones.

    Easing parallel programming on heterogeneous systems

    Get PDF
    The most common way to run HPC (High Performance Computing) applications in reasonable execution times, and in a scalable way, is to use parallel computing systems. The current trend in HPC systems is to include in the same execution machine several computing devices of different types and architectures. However, using them poses specific challenges to the programmer. A programmer must be an expert in the existing tools and abstractions for distributed memory, in the programming models for shared-memory systems, and in the specific programming models for each kind of coprocessor, in order to create hybrid programs that can efficiently exploit all the capabilities of the machine. Currently, all these problems must be solved by the programmer, which makes programming a heterogeneous machine a real challenge. This Thesis addresses several of the main problems related to the parallel programming of highly heterogeneous and distributed systems. It makes proposals that solve problems ranging from the creation of codes portable across different types of devices, accelerators, and architectures, while still achieving maximum efficiency, to the problems that appear in distributed-memory systems in relation to communications and the partitioning of data structures.
    Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Doctorado en Informática.

    A Technique to Automatically Determine Ad-hoc Communication Patterns at Runtime

    Get PDF
    Current High Performance Computing (HPC) systems are typically built as interconnected clusters of shared-memory multicore computers. Several techniques to automatically generate parallel programs from high-level parallel languages or sequential codes have been proposed. To properly exploit the scalability of HPC clusters, these techniques should take into account the combination of data communication across distributed memory and the exploitation of shared-memory models. In this paper, we present a new communication calculation technique to be applied across different SPMD (Single Program Multiple Data) code blocks, containing several uniform data access expressions. We have implemented this technique in Trasgo, a programming model and compilation framework that transforms parallel programs from a high-level parallel specification that deals with parallelism in a unified, abstract, and portable way. The proposed technique computes at runtime exact coarse-grained communications for distributed message-passing processes. Applying this technique at runtime has the advantage of being independent of compile-time decisions, such as the tile size chosen for each process. Our approach allows the automatic generation of pre-compiled multi-level parallel routines, libraries, or programs that can adapt their communication, synchronization, and optimization structures to the target system, even when computing nodes have different capabilities. Our experimental results show that, despite our runtime calculation, our approach can automatically produce efficient programs compared with MPI reference codes, and with codes generated with auto-parallelizing compilers.
    2018-12-01
    MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H6 (TIN2016-81840-REDT), COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS), and the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.

    Controllers: an abstraction to ease the use of hardware accelerators

    Get PDF
    Nowadays the use of hardware accelerators, such as graphics processing units or Xeon Phi coprocessors, is key to solving computationally costly problems that require high-performance computing. However, programming efficient solutions for these kinds of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that need to be computed at each moment, across different computing platforms, also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel-launching details of hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels on multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model allows the programmer to simplify the proper selection of values for several configuration parameters that can be chosen when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in development and porting costs, with significantly low overheads in execution times when compared to manually programmed and optimized solutions that directly use CUDA and OpenMP.
    2019-01-01
    MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

    Multi-Device Controllers: A Library To Simplify The Parallel Heterogeneous Programming

    Get PDF
    Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and of hybrid programming mixing accelerators and CPU cores. However, in many cases, portability compromises efficiency on different devices, and there are details concerning the coordination of different types of devices that still have to be tackled by the programmer. In this work, we introduce the Multi-Controller, an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions, simplifying data partition, mapping, and the transparent deployment of both simple generic kernels, portable across different device types, and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA’s GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing data movements and hiding the launch details. The results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
    2020-01-01
    MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H6 (TIN2016-81840-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

    The Interactivity of a Virtual Museum at the Service of the Teaching of Applied Geology

    Full text link
    [EN] In a framework in which teaching practice is a dynamic process, predisposed to continuous innovation, the Geological Collection of the University of León (CGULe), with 2000 specimens of minerals, rocks and fossils, offers an opportunity for teaching innovation in relation to the geological subjects taught in the Degrees of Mining Engineering and Energy Engineering. At http://laboratorio.wesped.es/, the first phase of development of the Virtual Museum of the CGULe is shown, where information and images of minerals and mineral deposits from León are offered. Likewise, videos of mineral characterization tests, made by students as a practical exercise of the subject "Mineralogy and Petrography" (Degree in Mining Engineering), are offered as part of a teaching innovation. This teaching innovation was evaluated in two ways: a) comparing the academic results of students in this exercise with equivalent results from previous courses, and b) conducting a satisfaction survey. Given the small number of students who participated in this experience, the results of this evaluation are inconclusive. For this reason, the teaching innovation will be continued over time and extended to other subjects of the above-mentioned degrees.
    The work was partially funded by the project "Design of practical teaching-learning experiences in relation to the virtual musealization of the Geological Collection of the University of León", in the framework of the Support Plan for Teaching Innovation of the University of León (PAID & PAGID 2016). Likewise, the authors express their gratitude to D. Luis Armando Conejo Lombas, who donated specimens to the Geological Collection of the University of León as well as photographs of deposits for the Virtual Museum, and to the collaborators: Mr. Ángel Díez Bragado, Mr. Jesús García del Canto, Mr. Manuel Urcera Valladares and Mr. Guillermo Salazar Brugos.
    Gómez-Fernández, F.; Fernández-Raga, M.; Alaiz-Moretón, H.; Castañon-García, A.; Palencia, C. (2017). The Interactivity of a Virtual Museum at the Service of the Teaching of Applied Geology. In Proceedings of the 3rd International Conference on Higher Education Advances. Editorial Universitat Politècnica de València. 712-719. https://doi.org/10.4995/HEAD17.2017.5366

    Computational and Mathematical Methods in Science and Engineering (CMMSE)

    Get PDF
    Currently, the generation of parallel codes that are portable to different kinds of parallel computers is a challenge. Many approaches have been proposed during the last years, following two different paths: programming from scratch using new programming languages and models that deal with parallelism explicitly, or automatically generating parallel codes from already existing sequential programs. Using the current main-trend parallel languages, the programmer deals with mapping and optimization details, which forces them to take into account details of the execution platform to obtain good performance. In code generators from sequential programs, programmers cannot control basic mapping decisions, and many times the programmer needs to transform the code to expose to the compiler the information needed to leverage important optimizations. This paper presents a new high-level parallel programming language named CMAPS, designed to be used with the Trasgo parallel programming framework. This language provides a simple and explicit way to express parallelism at a highly abstract level. The programmer does not face decisions about granularity, thread management, or interprocess communication. Thus, the programmer can express different parallel paradigms in an easy, unified, abstract, and portable form. The language supports the features required by transformation models such as Trasgo to generate parallel codes that adapt their communication and synchronization structures to target machines composed of mixed distributed- and shared-memory parallel multicomputers.
    Ministerio de Economía y Competitividad (Spain) and the ERDF program of the European Union: CAPAP-H5 network (TIN2014-53522), MOGECOPP project (TIN2011-25639), HomProg-HetSys project (TIN2014-58876-P); the Junta de Castilla y León (Spain): ATLAS project (VA172A12-2).

    HiPEAC 2017 Workshop on High-Level Parallel Programming for GPUs (HLPGPU)

    No full text
    Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and of hybrid programming mixing accelerators and CPU cores. However, portability in many cases compromises efficiency on different devices, and there are details about the coordination of different types of devices that should still be tackled by the programmer. In this work we introduce the Multi-Controller (MCtrl), an abstract entity implemented in a library, that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions, simplifying the data partition, mapping, and transparent deployment of both simple generic kernels, portable across different device types, and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA’s GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing the data movements and hiding the launching details. Results of an experimental study with four case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
    MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

    5th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO'2017)

    No full text
    We propose to move to runtime part of the compile-time analysis needed to generate the communication code for distributed-memory systems. Communication stages on distributed-memory systems have a significant impact on performance, so reducing communication time is key to improving execution time. We have developed a technique that uses a hierarchical tiling array library to represent and manage rectangular index spaces at runtime. The data to be received and/or sent by the local process to another one are calculated by intersecting the set of indexes read or written by one process with the set of indexes written or read by the local process.

    Compilers for Parallel Computing (CPC)

    No full text
    Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is simply to use a message-passing paradigm, launching as many processes as cores are available. Nevertheless, to better exploit the scalability of these clusters and highly parallel multicore systems, their distributed- and shared-memory hierarchies need to be used efficiently. This implies combining different programming paradigms and tools at different levels of the program design. Programming in this kind of environment is challenging. Many successful parallel programming models and tools have been proposed for specific environments. However, the application programmer still faces many important decisions, not related to the parallel algorithms but to implementation issues that are key to obtaining efficient programs. For example, decisions about partition and locality vs. synchronization/communication costs; grain selection and tiling; proper parallelization strategies for each grain level; or mapping, layout, and scheduling details. Moreover, many of these decisions may change with different machine details or structure, or even with data sizes. This paper presents an automatic code-generation system for mixed distributed- and shared-memory parallel multicomputers. We present an extension of the Trasgo programming model. This extended model supports a wider range of parallel structures and applications, where coordination is expressed at an abstract level. Transparent modular objects are invoked to guide the partition and mapping of both data and processes across the whole system. We present a technique that, for affine expressions, computes exact aggregated communications at the distributed level. It uses the intersection of remote and local footprints in terms of the mapping policies selected.
    Moreover, Trasgo 2.0 integrates polyhedral analysis tools to obtain optimizations inside each shared-memory parallel node at the shared level. This approach allows the automatic generation of multilevel parallel programs that adapt their communication and synchronization structures to the target machine. Our experimental results, for both shared- and distributed-memory environments, show how this approach can automatically produce efficient codes when compared with manually optimized codes using MPI or OpenMP models.